Spatial Correlation and Variography
2024-11-19
We have already seen different simple interpolation methods that take space into account when estimating unknown values, mainly by incorporating the distance between locations of measured values and the prediction location.
Next, we will look at geostatistical interpolation methods that also take into account knowledge about how certain variables are spatially (auto)correlated.
First, let's recap some basic concepts of correlation.
\[ \newcommand{\E}{{\rm E}} % E expectation operator \newcommand{\Var}{{\rm Var}} % Var variance operator \newcommand{\Cov}{{\rm Cov}} % Cov covariance operator \newcommand{\Cor}{{\rm Corr}} \]
Random variables (RVs) are numeric variables whose outcomes are subject to chance.
The cumulative distribution function \(F_Z(\cdot)\) of the RV \(Z\) gives, for every outcome \(z\), the probability that \(Z\) takes a value of at most \(z\):
\[P(Z \le z) = F_Z(z) = \int_{-\infty}^z f_Z(u)\,du\] where \(f_Z(\cdot)\) is the probability density function of \(Z\). The total probability is 1, i.e. \(\int_{-\infty}^{\infty} f_Z(u)\,du = 1\).
Random variables have two key characteristics: an expectation (mean) and a variance.
Try to think of \(E(Z)\) as \(\frac{1}{n}\sum_{i=1}^{n} z_i\), with \(n \rightarrow \infty\).
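The idea of the expectation as a long-run average can be checked numerically. The following is a minimal sketch, assuming a uniform variable on (0, 1) as the example RV (its expectation is 0.5); the function name `sample_mean` and the seed are illustrative choices, not part of the lecture material.

```python
import random

def sample_mean(n, seed=42):
    """Average of n draws from a uniform(0, 1) variable, whose expectation is 0.5."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n

# The sample mean settles near the expectation as n grows.
for n in (10, 1_000, 100_000):
    print(n, round(sample_mean(n), 4))
```

For small \(n\) the average can be visibly off; for large \(n\) it should lie close to 0.5.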
Two random variables \(X\) and \(Y\) have covariance defined as \(\Cov(X,Y) = E[(X-E(X))(Y-E(Y))]\).
Correlation is scaled covariance, scaled by the standard deviations of both variables. For two variables \(X\) and \(Y\), it is \[\Cor(X,Y) = \frac{\Cov(X,Y)}{\sqrt{\Var(X)\Var(Y)}}\]
It is quite easy to show that \(|\Cov(X,Y)| \le \sqrt{\Var(X)\Var(Y)}\) (the Cauchy-Schwarz inequality), so correlation ranges from -1 to 1.
These bounds are attained: note that \(\Cov(X,X)=\Var(X)\) and \(\Cov(X,-X)=-\Var(X)\), which give correlations of +1 and -1.
It is perhaps easier to think of covariance as unscaled correlation.
Note: A large covariance does not imply a strong correlation, because covariance depends on the units (scale) of the variables.
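The note about large covariances can be illustrated with a small experiment. This is a sketch under assumptions: the data are simulated (a weak linear relation plus noise), and the change of units from metres to millimetres is a hypothetical example of rescaling.

```python
import math
import random

def covariance(x, y):
    """Population covariance: mean product of deviations from the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Covariance divided by the product of the standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [a + 3 * rng.gauss(0, 1) for a in x]   # weak linear relation plus noise

# Same data expressed in units that are 1000 times smaller (e.g. m -> mm).
x_mm = [1000 * a for a in x]
y_mm = [1000 * b for b in y]

print("cov:", covariance(x, y), "cor:", correlation(x, y))
print("cov:", covariance(x_mm, y_mm), "cor:", correlation(x_mm, y_mm))
```

Rescaling both variables multiplies the covariance by a factor of one million, while the correlation stays the same (up to rounding).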
Random variable: \(Z\) follows a probability distribution, specified by a density function \(f(z)\) (for a discrete \(Z\), the probability function \(f(z)=\Pr(Z=z)\)) or a distribution function \(F(z)=\Pr(Z \le z)\)
Expectation: \(\E(Z) = \int_{-\infty}^{\infty} s f(s)\,ds\) – center of mass, mean.
Variance: \(\Var(Z)=\E[(Z-\E(Z))^2]\) – mean squared distance from the mean; measure of spread; its square root is the standard deviation of \(Z\).
Covariance: \(\Cov(X,Y)=\E((X-\E(X))(Y-\E(Y)))\) – mean product; can be negative; \(\Cov(X,X)=\Var(X)\).
Correlation: \(r_{XY}=\frac{\Cov(X,Y)}{\sqrt{\Var(X)\Var(Y)}}\) – covariance normalized to \([-1,1]\); -1 or +1 indicates perfect (linear) correlation.
Waldo Tobler’s first law of geography:
“Everything is related to everything else, but near things are more related than distant things.” [Tobler, 1970, p.236]
Tobler, W. R. (1970). “A computer movie simulating urban growth in the Detroit region”. Economic Geography, 46(2): 234-240.
Spatial correlation can be explored in different ways.
One way is to take up an idea from time series: look at lagged correlations, and the \(h\)-scatterplot.
What is it? Plots of (or correlation between) \(Z(s)\) and \(Z(s+h)\), where \(s+h\) is \(s\), shifted by \(h\) (time distance, spatial distance).
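For a regularly spaced series, the pairs shown in an \(h\)-scatterplot and their correlation can be computed as follows. This is a minimal sketch: the test series (each value is 0.8 times the previous one plus noise) and the names `lagged_pairs` and `correlation` are assumptions made for illustration.

```python
import math
import random

def lagged_pairs(z, h):
    """Pairs (Z(s), Z(s+h)) for a regularly spaced series and integer lag h >= 1."""
    return list(zip(z[:-h], z[h:]))

def correlation(pairs):
    """Pearson correlation of the two coordinates of a list of pairs."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical test series: each value is 0.8 times the previous one plus noise.
rng = random.Random(0)
z = [0.0]
for _ in range(4999):
    z.append(0.8 * z[-1] + rng.gauss(0, 1))

lag_correlation = {h: correlation(lagged_pairs(z, h)) for h in (1, 2, 5, 10)}
for h, r in lag_correlation.items():
    print(h, round(r, 3))
```

Plotting the pairs returned by `lagged_pairs(z, h)` gives the \(h\)-scatterplot; for this series the correlation should shrink as the lag grows.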
Another way to explore spatial correlation is to plot covariances of values at point pairs against the distance between these points.
Group into intervals
Look at means within intervals
Fit a line
In geostatistics, spatial correlation is modelled by the semivariogram instead of a covariogram (or correlogram). The term variogram is used synonymously with semivariogram. The (semi)variogram plots semivariance as a function of distance.
Covariance: \(\Cov(Z(s),Z(s+h)) = C(h) = \E[(Z(s)-m)(Z(s+h)-m)]\)
Semivariance: \(\gamma(h) = \frac{1}{2} \E[(Z(s)-Z(s+h))^2]\)
\[\E[(Z(s)-Z(s+h))^2] = \E[(Z(s))^2 + (Z(s+h))^2 -2Z(s)Z(s+h)]\]
Assume \(m=0\):
\[\begin{aligned}\E[(Z(s)-Z(s+h))^2] &= \E[(Z(s))^2] + \E[(Z(s+h))^2] - 2\E[Z(s)Z(s+h)] \\ &= 2\Var(Z(s)) - 2\Cov(Z(s),Z(s+h)) = 2C(0)-2C(h)\end{aligned}\]
\(\gamma(h) = C(0)-C(h)\)
\(\gamma(h)\) is the semivariogram of \(Z(s)\).
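The assumption \(m=0\) costs no generality, because a constant mean cancels in the difference:
\[Z(s)-Z(s+h) = (Z(s)-m)-(Z(s+h)-m)\]
Applying the same steps to the centred variable \(Z(s)-m\) gives \(\E[(Z(s)-m)^2]=C(0)\) and \(\E[(Z(s)-m)(Z(s+h)-m)]=C(h)\), and again \(\gamma(h)=C(0)-C(h)\).
As an illustration, assume an exponential covariance \(C(h)=\sigma^2 e^{-h/a}\) (a model chosen here only as an example, with hypothetical parameters \(\sigma^2\) and \(a\)). The relation then gives
\[\gamma(h)=\sigma^2\left(1-e^{-h/a}\right)\]
which is 0 at \(h=0\) and approaches \(\sigma^2 = C(0)\) as \(h\) grows.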
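The semivariogram is estimated from data by the sample variogram
\[\hat{\gamma}(\tilde{h})=\frac{1}{2N_h}\sum_{i=1}^{N_h}(Z(s_i)-Z(s_i+h))^2 \ \ h \in \tilde{h}\]
where \(N_h\) is the number of point pairs whose separation \(h\) falls in the distance interval \(\tilde{h}\).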
Group into intervals
Look at means within intervals
Fit a line to the empirical variogram
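The first two steps (grouping point pairs into distance intervals and averaging within them) can be sketched as below. This is not the implementation of any particular package: the function name, the `width` and `cutoff` parameters, the use of interval midpoints as labels, and the simulated data are all assumptions made for illustration. Fitting a variogram model to the result is not covered here.

```python
import math
import random
from collections import defaultdict

def empirical_variogram(coords, values, width, cutoff):
    """Return {interval midpoint: (semivariance, number of pairs)}.

    Pairs are grouped into distance intervals [k*width, (k+1)*width);
    pairs separated by cutoff or more are ignored.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d >= cutoff:
                continue
            b = int(d // width)
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    # Half the mean squared difference within each interval.
    return {(b + 0.5) * width: (sums[b] / (2 * counts[b]), counts[b])
            for b in sorted(counts)}

# Hypothetical data: a smooth surface plus a little noise at random locations.
rng = random.Random(3)
coords = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(150)]
values = [math.sin(x / 2) + math.cos(y / 2) + rng.gauss(0, 0.1) for x, y in coords]

for dist, (gamma, n_pairs) in empirical_variogram(coords, values, 1.0, 5.0).items():
    print(f"{dist:4.1f}  {gamma:6.3f}  {n_pairs}")
```

The double loop visits every point pair once, so the cost grows quadratically with the number of observations; for large data sets a spatial index or a cutoff-based search would be preferable.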
Some processes are directionally dependent (anisotropic), i.e. do not have identical properties in all directions. When investigating such phenomena, the semivariance depends not only on the distance between two points but also on the direction of the distance vector.
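Direction dependence can be explored by using only those point pairs whose connecting vector points roughly in a chosen direction. The sketch below makes several assumptions that are not from the text: angles are measured counter-clockwise from the x-axis and folded into [0, 180), a pair is accepted if it lies within `tol` degrees of `target`, and the artificial field varies only along x.

```python
import math

def direction_deg(p, q):
    """Direction of the vector p -> q in degrees, folded into [0, 180)."""
    angle = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))
    return angle % 180.0

def within_tolerance(angle, target, tol):
    """True if two directions (mod 180 degrees) differ by at most tol degrees."""
    diff = abs(angle - target) % 180.0
    return min(diff, 180.0 - diff) <= tol

def directional_semivariance(coords, values, target, tol, max_dist):
    """Semivariance over pairs no farther apart than max_dist in the given direction."""
    total, count = 0.0, 0
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) > max_dist:
                continue
            if within_tolerance(direction_deg(coords[i], coords[j]), target, tol):
                total += (values[i] - values[j]) ** 2
                count += 1
    return total / (2 * count) if count else float("nan")

# Artificial anisotropic field on a 5 x 5 grid: values change only along x.
coords = [(float(x), float(y)) for x in range(5) for y in range(5)]
values = [x for x, _ in coords]

east_west = directional_semivariance(coords, values, target=0.0, tol=10.0, max_dist=1.0)
north_south = directional_semivariance(coords, values, target=90.0, tol=10.0, max_dist=1.0)
print(east_west, north_south)  # → 0.5 0.0
```

For neighbouring grid points the semivariance is 0.5 along x and 0 along y, so the same separation distance yields different semivariances depending on direction.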
In order to be able to estimate spatial correlation from observational data, we need to assume intrinsic stationarity.
This assumes the underlying process to be a random function composed of a mean and a residual
\(Z(s) = m + e(s)\)
with a constant mean
\(E(Z(s)) = m\)
and a variogram defined as
\(\gamma(h)= \frac{1}{2}E[(Z(s)-Z(s+h))^2]\)
This implies that the variance of \(Z\) is constant, and that the spatial correlation of \(Z\) does not depend on location \(s\), but only on separation distance \(h\).
Given a theoretical (co)variogram, we can create processes (random fields) that have the desired properties.
In the following, we create different example simulations that show for an (artificial) variable how different variogram properties are associated with different spatial distributions of the values of that variable.
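One way such a simulation can be produced is to build the covariance matrix for all locations and multiply its Cholesky factor with independent standard normal values. The sketch below is an illustration under assumptions that are not from the text: an exponential covariance model, locations on a 1-D transect, a hand-written Cholesky factorisation, and hypothetical names and parameter values throughout.

```python
import math
import random

def exponential_covariance(h, sill=1.0, range_par=5.0):
    """C(h) = sill * exp(-h / range_par); an assumed model for this sketch."""
    return sill * math.exp(-h / range_par)

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def simulate(locations, range_par, seed=0):
    """One realisation of a zero-mean Gaussian field on 1-D locations."""
    rng = random.Random(seed)
    cov = [[exponential_covariance(abs(s - t), range_par=range_par)
            for t in locations] for s in locations]
    low = cholesky(cov)
    white = [rng.gauss(0, 1) for _ in locations]
    # Correlated values = L times independent standard normal values.
    return [sum(low[i][k] * white[k] for k in range(i + 1))
            for i in range(len(locations))]

def mean_squared_increment(z):
    """Average squared difference between neighbouring values (roughness)."""
    return sum((a - b) ** 2 for a, b in zip(z[:-1], z[1:])) / (len(z) - 1)

locations = list(range(100))
rough = simulate(locations, range_par=1.0)    # short correlation range
smooth = simulate(locations, range_par=20.0)  # long correlation range
print(mean_squared_increment(rough), mean_squared_increment(smooth))
```

With a short range, neighbouring values are only weakly correlated and the series looks rough; with a long range, neighbouring values are similar and the series varies smoothly.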